A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013
International Conference on Data Minin
Structural Drift: The Population Dynamics of Sequential Learning
We introduce a theory of sequential causal inference in which learners in a
chain estimate a structural model from their upstream teacher and then pass
samples from the model to their downstream student. It extends the population
dynamics of genetic drift, recasting Kimura's selectively neutral theory as a
special case of a generalized drift process using structured populations with
memory. We examine the diffusion and fixation properties of several drift
processes and propose applications to learning, inference, and evolution. We
also demonstrate how the organization of drift process space controls fidelity,
facilitates innovations, and leads to information loss in sequential learning
with and without memory. Comment: 15 pages, 9 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/sdrift.ht
Genetic Algorithm Amplifier Biasing System (GAABS): Genetic Algorithm for Biasing on Differential Analog Amplifiers
Genetic Algorithm Amplifier Biasing System (GAABS) - Senior Project Analysis
Summary of Functional Requirements
This project integrates LTSpice with a Python script that runs a genetic algorithm to bias a differential amplifier. The system biases the amplifier with two voltages: the base voltage for the PNP BJTs of the active loads and a voltage controlling the current of the current sink. The Python script calls LTSpice from the command line, reads back the simulation data, and iterates until the amplifier is biased for the greatest gain on an arbitrary input voltage.
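The search loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `gain` is a hypothetical stand-in for an LTSpice run (a smooth surface with an arbitrarily chosen peak), and the function names, population size, mutation width, and voltage bounds are all assumptions made for the example.

```python
import random


def gain(v_base, v_sink):
    # Hypothetical stand-in for an LTSpice simulation: a smooth surface
    # that peaks at an arbitrarily chosen bias point (1.2 V, 0.7 V).
    return 100.0 - 50.0 * ((v_base - 1.2) ** 2 + (v_sink - 0.7) ** 2)


def evolve(fitness, bounds, pop_size=20, generations=60, sigma=0.1, seed=0):
    """Elitist genetic algorithm over real-valued genes (here, two voltages)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the top quarter unchanged (elitism).
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        parents = pop[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for (lo, hi), gene_a, gene_b in zip(bounds, a, b):
                # Uniform crossover followed by Gaussian mutation,
                # clamped to the allowed voltage range.
                gene = rng.choice((gene_a, gene_b)) + rng.gauss(0.0, sigma)
                child.append(min(hi, max(lo, gene)))
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=lambda ind: fitness(*ind))


best = evolve(gain, bounds=[(0.0, 5.0), (0.0, 5.0)])
print("best bias voltages:", best, "gain:", gain(*best))
```

In a real setup, `gain` would be replaced by a function that launches the simulator and parses its output. Because the best individuals are carried over unchanged, the best gain found never decreases from one generation to the next.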
Primary Constraints
Some of the main challenges associated with this project are getting the genetic algorithm to work consistently and getting LTSpice to integrate well with the command line. The genetic algorithm, though controlled, will involve a good deal of randomness in converging to a certain gain value. A strong genetic algorithm should be able to converge to the same value every time and should be designed accordingly. I have never used LTSpice via the command line, but it shouldn’t be too difficult to call. Collecting data from the simulation will be challenging, but ideally there would be resources for help on that portion.
Economic
The original estimated cost for components is 0.
Python 2.7: 0
Total: 0
The final cost was 0, as anticipated. Everything that could be downloaded was free to download.
At the start of the project, development was anticipated to take 100+ hours. Given the need to integrate everything and get the genetic algorithm working well, 100 hours seemed reasonable. In the end, it took roughly 80 hours. Trying different approaches to the problem took up a lot of time, as did tweaking the genetic algorithm and running the tests, but the integration was easy to set up, which shaved a large chunk off the projected time to complete the project.
Manufacturing Information
This code is open source on GitHub, and won’t be manufactured on a commercial basis.
Environmental
There are no environmental impacts associated with manufacturing. The only potential impact on the environment of this project would be the heat generated by a computer running the script. The script takes up to 30+ minutes to run, and it is somewhat intensive in terms of computing power; this would generate heat from the computer running it, and heat from computers cannot be neglected in terms of their effect on global warming. However, the heat that would be generated by 1 computer should be considered negligible, as there are much greater contributors.
Manufacturability
As stated before, there are no issues with manufacturing this project because it’s open source. Everything needed to run the code can be found online for free download, and the script can be taken from online.
Sustainability
The code runs on Python 2.7 and the current version of LTSpice. It should have no issue running on later versions of Python and LTSpice, so long as there are no drastic changes. The project is on the internet, and so it will be sustainably existing as long as it’s not taken down by GitHub. Upgrades that would improve the design of the project include running more children per generation in simulation at once to speed up runtime and taking more generations to come to the best bias voltages to make it more accurate.
Ethical
There is no ethical implication to the use or design of this project.
Health and Safety
Other than the impact of long-term computer use on the user, there are no health and safety concerns with this project whatsoever.
Social and Political
There are no social and political implications to the use or design of this project.
Development
During the development of this project, I had to learn how to use Python on a much deeper level. My CPE 101 class was in Python, but that was winter quarter of 2015, and this project took place in the winter and spring of 2018. I remembered very little, but I got to see a lot of Python’s functionality as a great language for running scripts across a variety of applications and platforms. I had to research genetic algorithms and how to implement them extensively, as that was a huge portion of this project
The Epstein-Barr Virus Episome Maneuvers between Nuclear Chromatin Compartments during Reactivation.
The human genome is structurally organized in three-dimensional space to facilitate functional partitioning of transcription. We learned that the latent episome of the human Epstein-Barr virus (EBV) preferentially associates with gene-poor chromosomes and avoids gene-rich chromosomes. Kaposi's sarcoma-associated herpesvirus behaves similarly, but human papillomavirus does not. Contacts on the EBV side localize to OriP, the latent origin of replication. This genetic element and the EBNA1 protein that binds there are sufficient to reconstitute chromosome association preferences of the entire episome. Contacts on the human side localize to gene-poor and AT-rich regions of chromatin distant from transcription start sites. Upon reactivation from latency, however, the episome moves away from repressive heterochromatin and toward active euchromatin. Our work adds three-dimensional relocalization to the molecular events that occur during reactivation. Involvement of myriad interchromosomal associations also suggests a role for this type of long-range association in gene regulation. IMPORTANCE: The human genome is structurally organized in three-dimensional space, and this structure functionally affects transcriptional activity. We set out to investigate whether a double-stranded DNA virus, Epstein-Barr virus (EBV), uses mechanisms similar to those of the human genome to regulate transcription. We found that the EBV genome associates with repressive compartments of the nucleus during latency and with active compartments during reactivation. This study advances our knowledge of the EBV life cycle, adding three-dimensional relocalization as a novel component to the molecular events that occur during reactivation. Furthermore, the data add to our understanding of nuclear compartments, showing that disperse interchromosomal interactions may be important for regulating transcription
Observability and Controllability of Nonlinear Networks: The Role of Symmetry
Observability and controllability are essential concepts to the design of
predictive observer models and feedback controllers of networked systems. For
example, noncontrollable mathematical models of real systems have subspaces
that influence model behavior, but cannot be controlled by an input. Such
subspaces can be difficult to determine in complex nonlinear networks. Since
almost all of the present theory was developed for linear networks without
symmetries, here we present a numerical and group representational framework,
to quantify the observability and controllability of nonlinear networks with
explicit symmetries that shows the connection between symmetries and nonlinear
measures of observability and controllability. We numerically observe and
theoretically predict that not all symmetries have the same effect on network
observation and control. Our analysis shows that the presence of symmetry in a
network may decrease observability and controllability, although networks
containing only rotational symmetries remain controllable and observable. These
results alter our view of the nature of observability and controllability in
complex networks, change our understanding of structural controllability, and
affect the design of mathematical models to observe and control such networks. Comment: 19 pages, 9 figure
Model Aggregation for Distributed Content Anomaly Detection
Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic deviates from this model. Content anomaly detection (CAD) is a variant of this approach that models the payloads of such traffic instead of higher level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors. In the past, CAD was unsuited to cloud environments due to the relative overhead of content inspection and the dynamic routing of content paths to geographically diverse sites. We challenge this notion and introduce new methods for efficiently aggregating content models to enable scalable CAD in dynamically-pathed environments such as the cloud. These methods eliminate the need to exchange raw content, drastically reduce network and CPU overhead, and offer varying levels of content privacy. We perform a comparative analysis of our methods using Random Forest, Logistic Regression, and Bloom Filter-based classifiers for operation in the cloud or other distributed settings such as wireless sensor networks. We find that content model aggregation offers statistically significant improvements over non-aggregate models with minimal overhead, and that distributed and non-distributed CAD have statistically indistinguishable performance. Thus, these methods enable the practical deployment of accurate CAD sensors in a distributed attack detection infrastructure
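One way to picture aggregating content models without exchanging raw content is with Bloom filters, which can be merged by a bitwise OR when they share the same size and hash functions. The sketch below is an illustration under that assumption only; the abstract does not state how its aggregation works, and the class, the byte 3-gram features, and the scoring rule here are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # Merging only makes sense for filters built with identical parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def ngrams(payload, n=3):
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train(payloads, n=3):
    """Build a content model from attack-free payloads (bytes objects)."""
    model = BloomFilter()
    for payload in payloads:
        for gram in ngrams(payload, n):
            model.add(gram)
    return model


def anomaly_score(model, payload, n=3):
    """Fraction of the payload's n-grams that the model has never seen."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in model) / len(grams)


site_a = train([b"GET /index.html HTTP/1.1"])
site_b = train([b"POST /login HTTP/1.1"])
merged = site_a.union(site_b)  # only bit arrays are combined, not payloads
print(anomaly_score(site_a, b"POST /login HTTP/1.1"))
print(anomaly_score(merged, b"POST /login HTTP/1.1"))
```

Since a Bloom filter has no false negatives, the merged model scores traffic seen at either site as fully normal, while each site shares only a bit array.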
Improving Critical Speed Calculations Using Flexible Bearing Support FRF Compliance Data.
Lecture. Pg. 69-78. The importance of including flexible supports in rotordynamic analyses is discussed. Various methods of including the support in rotordynamic calculations are reviewed. A method is described in which actual compliance frequency response function, FRF, data are used directly in a rotordynamic forced response computer program to accurately predict a steam turbine rotor's critical speed. The flexible support model is described as two single degree of freedom, SDOF, spring-mass-damper systems per bearing support. The methodology of acquiring the FRF data via impact hammer testing is described, and the equations are summarized that incorporate the FRF data into the flexible support model. Three flexible support models of increasing sophistication are used to analytically predict the rotor and support resonances. These results are compared to the actual steam turbine speed-amplitude plots. Modelling the support as many speed dependent SDOF systems accurately predicts the location of the rotor's first critical speed and also the split critical peaks and several support resonance speeds
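For orientation, the compliance FRF of a single SDOF spring-mass-damper system has the textbook form H(ω) = 1 / (k − mω² + icω), where m, c, and k are the mass, damping, and stiffness. The sketch below evaluates that expression. It is not the paper's own set of equations, which the abstract does not give, and the parameter values are made up for illustration.

```python
import cmath
import math


def sdof_compliance(omega, m, c, k):
    """Compliance (displacement per unit force) of a spring-mass-damper system.

    omega is in rad/s, m in kg, c in N*s/m, k in N/m; the result is in m/N.
    """
    return 1.0 / complex(k - m * omega ** 2, c * omega)


# Hypothetical support parameters, chosen only for illustration.
m, c, k = 50.0, 2000.0, 2.0e8
omega_n = math.sqrt(k / m)  # undamped natural frequency, rad/s

static = sdof_compliance(0.0, m, c, k)        # equals 1/k at zero frequency
resonant = sdof_compliance(omega_n, m, c, k)  # magnitude 1/(c * omega_n)

print("static compliance (m/N):", abs(static))
print("compliance at resonance (m/N):", abs(resonant))
print("phase at resonance (deg):", math.degrees(cmath.phase(resonant)))
```

The reciprocal of this compliance is the dynamic stiffness, k − mω² + icω, which is the quantity a forced response calculation would typically use for a support.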